Recently, over-height vehicle strike frequently occurs, causing great economic cost and serious safety problems. Hence, an alert system which can accurately discover any possible height limiting devices in advance is necessary to be employed in modern large or medium sized cars, such as touring cars. Detecting and estimating the height limiting devices act as the key point of a successful height limit alert system. Though there are some works research height limit estimation, existing methods are either too computational expensive or not accurate enough. In this paper, we propose a novel stereo-based pipeline named SHLE for height limit estimation. Our SHLE pipeline consists of two stages. In stage 1, a novel devices detection and tracking scheme is introduced, which accurately locate the height limit devices in the left or right image. Then, in stage 2, the depth is temporally measured, extracted and filtered to calculate the height limit device. To benchmark the height limit estimation task, we build a large-scale dataset named "Disparity Height", where stereo images, pre-computed disparities and ground-truth height limit annotations are provided. We conducted extensive experiments on "Disparity Height" and the results show that SHLE achieves an average error below than 10cm though the car is 70m away from the devices. Our method also outperforms all compared baselines and achieves state-of-the-art performance. Code is available at https://github.com/Yang-Kaixing/SHLE.
translated by 谷歌翻译
When a large language model (LLM) performs complex reasoning by chain of thought (CoT), it can be highly sensitive to individual mistakes. We have had to train verifiers to address this issue. As we all know, after human inferring a conclusion, they often check it by re-verifying it, which can avoid some mistakes. We propose a new method called self-verification that uses the conclusion of the CoT as a condition to build a new sample and asks the LLM to re-predict the original conditions which be masked. We calculate an explainable verification score based on the accuracy. This method can improve the accuracy of multiple arithmetics and logical reasoning datasets when using few-shot learning. we have demonstrated that LLMs can conduct explainable self-verification of their own conclusions and achieve competitive reasoning performance. Extensive experimentals have demonstrated that our method can help multiple large language models with self-verification can avoid interference from incorrect CoT. Code is available at \url{https://github.com/WENGSYX/Self-Verification}
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Transfer learning refers to the transfer of knowledge or information from a relevant source domain to a target domain. However, most existing transfer learning theories and algorithms focus on IID tasks, where the source/target samples are assumed to be independent and identically distributed. Very little effort is devoted to theoretically studying the knowledge transferability on non-IID tasks, e.g., cross-network mining. To bridge the gap, in this paper, we propose rigorous generalization bounds and algorithms for cross-network transfer learning from a source graph to a target graph. The crucial idea is to characterize the cross-network knowledge transferability from the perspective of the Weisfeiler-Lehman graph isomorphism test. To this end, we propose a novel Graph Subtree Discrepancy to measure the graph distribution shift between source and target graphs. Then the generalization error bounds on cross-network transfer learning, including both cross-network node classification and link prediction tasks, can be derived in terms of the source knowledge and the Graph Subtree Discrepancy across domains. This thereby motivates us to propose a generic graph adaptive network (GRADE) to minimize the distribution shift between source and target graphs for cross-network transfer learning. Experimental results verify the effectiveness and efficiency of our GRADE framework on both cross-network node classification and cross-domain recommendation tasks.
translated by 谷歌翻译
Deep reinforcement learning has recently emerged as an appealing alternative for legged locomotion over multiple terrains by training a policy in physical simulation and then transferring it to the real world (i.e., sim-to-real transfer). Despite considerable progress, the capacity and scalability of traditional neural networks are still limited, which may hinder their applications in more complex environments. In contrast, the Transformer architecture has shown its superiority in a wide range of large-scale sequence modeling tasks, including natural language processing and decision-making problems. In this paper, we propose Terrain Transformer (TERT), a high-capacity Transformer model for quadrupedal locomotion control on various terrains. Furthermore, to better leverage Transformer in sim-to-real scenarios, we present a novel two-stage training framework consisting of an offline pretraining stage and an online correction stage, which can naturally integrate Transformer with privileged training. Extensive experiments in simulation demonstrate that TERT outperforms state-of-the-art baselines on different terrains in terms of return, energy consumption and control smoothness. In further real-world validation, TERT successfully traverses nine challenging terrains, including sand pit and stair down, which can not be accomplished by strong baselines.
translated by 谷歌翻译
Talking face generation aims at generating photo-realistic video portraits of a target person driven by input audio. Due to its nature of one-to-many mapping from the input audio to the output video (e.g., one speech content may have multiple feasible visual appearances), learning a deterministic mapping like previous works brings ambiguity during training, and thus causes inferior visual results. Although this one-to-many mapping could be alleviated in part by a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model), it is still insufficient since the prediction is produced without enough information (e.g., emotions, wrinkles, etc.). In this paper, we propose MemFace to complement the missing information with an implicit memory and an explicit memory that follow the sense of the two stages respectively. More specifically, the implicit memory is employed in the audio-to-expression model to capture high-level semantics in the audio-expression shared space, while the explicit memory is employed in the neural-rendering model to help synthesize pixel-level details. Our experimental results show that our proposed MemFace surpasses all the state-of-the-art results across multiple scenarios consistently and significantly.
translated by 谷歌翻译
Full-body reconstruction is a fundamental but challenging task. Owing to the lack of annotated data, the performances of existing methods are largely limited. In this paper, we propose a novel method named Full-body Reconstruction from Part Experts~(FuRPE) to tackle this issue. In FuRPE, the network is trained using pseudo labels and features generated from part-experts. An simple yet effective pseudo ground-truth selection scheme is proposed to extract high-quality pseudo labels. In this way, a large-scale of existing human body reconstruction datasets can be leveraged and contribute to the model training. In addition, an exponential moving average training strategy is introduced to train the network in a self-supervised manner, further boosting the performance of the model. Extensive experiments on several widely used datasets demonstrate the effectiveness of our method over the baseline. Our method achieves the state-of-the-art performance. Code will be publicly available for further research.
translated by 谷歌翻译
大规模的地方认可是一项基本但具有挑战性的任务,在自主驾驶和机器人技术中起着越来越重要的作用。现有的方法已经达到了可接受的良好性能,但是,其中大多数都集中精力设计精美的全球描述符学习网络结构。长期以来忽略了特征概括和描述后的特征概括和描述符的重要性。在这项工作中,我们提出了一种名为GIDP的新方法,以学习良好的初始化并引起描述符,以供大规模识别。特别是,在GIDP中分别提出了无监督的动量对比度云预处理模块和基于重新的描述符后增强模块。前者旨在在训练位置识别模型之前对Point Cloud编码网络进行良好的初始化,而后来的目标是通过推理时间重新掌握预测的全局描述符。在室内和室外数据集上进行的广泛实验表明,我们的方法可以使用简单和一般的点云编码主干来实现最先进的性能。
translated by 谷歌翻译
与大脑变化相关的阿尔茨海默氏病(AD)和轻度认知障碍(MCI)的评估仍然是一项艰巨的任务。最近的研究表明,多模式成像技术的组合可以更好地反映病理特征,并有助于更准确地诊断AD和MCI。在本文中,我们提出了一种新型的基于张量的多模式特征选择和回归方法,用于诊断和生物标志物对正常对照组的AD和MCI鉴定。具体而言,我们利用张量结构来利用多模式数据中固有的高级相关信息,并研究多线性回归模型中的张量级稀疏性。我们使用三种成像方式(VBM- MRI,FDG-PET和AV45-PET)具有疾病严重程度和认知评分的临床参数来分析ADNI数据的方法的实际优势。实验结果表明,我们提出的方法与疾病诊断的最新方法的优越性能以及疾病特异性区域和与模态相关的差异的鉴定。这项工作的代码可在https://github.com/junfish/bios22上公开获得。
translated by 谷歌翻译
在本文中,我们研究了Micro-Video平台中的对象效果建议的新主题,这对于许多实际应用(例如广告插入)来说是一项具有挑战性但重要的任务。为了避免引入由图像框架直接学习视频内容引起的背景偏见的问题,我们建议利用3D人类姿势中隐藏的有意义的肢体语言进行推荐。为此,在这项工作中,引入了一种新型的人类姿势驱动的对象效应建议网络称为poserec。 Poserec利用了3D人姿势检测的优势,并从多框架3D人姿势中学习信息进行视频项目注册,从而导致高质量的对象效应建议性能。此外,为了解决对象效应建议中存在的固有的歧义和稀疏性问题,我们进一步提出了一种新颖的物品感知的隐性原型学习模块,并提供了一种新颖的姿势感知的托管性托管性硬性阴性挖掘模块,以更好地学习姿势 - 项目。更重要的是,为了为新研究主题进行基准方法,我们构建了一个新数据集,用于对象效果建议,名为Pose-Obe。对姿势攻击的广泛实验表明,我们的方法比强基础可以取得更高的性能。
translated by 谷歌翻译